
Entity Taxonomy Validation and Refinement

Executive Summary: Our review finds that the current 9-type taxonomy (identity, automation, connection, credential, owner, role, permission, resource, execution_evidence) covers the core concepts but needs refinement. For example, "automation" is better split into application and execution/job to match industry terms. The owner entity should support human, group/team, and org subtypes (per NIST 800-63's inclusive identity model【1†L12-L17】). We also identify missing types (e.g. service_account, managed_identity, ephemeral_session, token_exchange, federation_trust, policy_statement, resource_hierarchy, materialized_edge, evidence_pack, connector_instance). We provide a corrected ER diagram, entity tables, real-world AI workflow examples, and scope-drift scenarios (e.g. an AI agent that originally fetched public data and later scrapes PII). We outline schema changes (including remapping "autonomous_identity" records to the correct new type) and a migration roadmap with milestones. The analysis and recommendations are grounded in OAuth/OIDC and NIST standards (800-63/53) and in vendor docs (AWS IAM, Azure AD, Kubernetes, etc.).

Entity Taxonomy Assessment

  • Identity vs Credential: Identities (users, apps, machines) and credentials (secrets, tokens, certs) must be strictly separated. For example, OAuth 2.0 clearly distinguishes an application’s identity (client ID) from its credentials (client secret or token)【6†L12-L19】. We find one anomaly: connectors treat a GitHub PAT as an IdentitySubtype, but PATs are credentials. We recommend removing “pat” from identity subtypes and ensuring all authentication artifacts (API keys, certificates, tokens) map to the credential entity.
  • Automation vs Application/Execution: The term “automation” is ambiguous. Industry uses “application/service” or “job” for automated logic. We suggest splitting it into Application (defines what runs – e.g. a script or process) and Execution (Job) (the event of running it, which might carry context). In other words, separate the definition of automation (Flow, Business Rule) from its runtime invocation (execution_trace). This aligns with OAuth’s separation of apps vs tokens【6†L12-L19】.
  • Owner as Group/Org: The current owner entity covers only a human user. It should extend to groups, teams, and business units. NIST 800-63 recognizes that an identity can be an organization or device【1†L12-L17】, so an owner may be an entire team or OU. For example, a ServiceNow sys_user or an Azure AD group could be the owner of flows. Include owner_type with values {User,Group,OrgUnit,Team}.
  • Additional Entity Types: The existing 9 types omit important concepts. We propose:
    • ServiceAccount/ManagedIdentity: Subtypes of identity for cloud principals with a special lifecycle. E.g. AWS IAM user vs AWS service-linked role, Azure Managed Identity (credentials are managed by the platform, not by the customer), GCP Service Account (identified by a unique email). Purpose: represent non-human principals with built-in integration.
    • EphemeralSession: Represents a short-lived session token (e.g. AWS STS session, Kubernetes ServiceAccount token, GitHub OIDC session). Purpose: model delegation chains.
    • TokenExchange: A special credential subtype capturing multi-hop auth (per RFC 8693【6†L12-L19】). Attributes: original_token, exchange_target, scopes.
    • FederationTrust: Captures an OIDC/SAML trust config (e.g. GitHub’s OIDC provider config in AWS). Purpose: treat a configured trust as an authenticator (Credential subtype with issuer, audience, thumbprints)【10†L6-L11】.
    • PolicyStatement: Normalizes a policy rule (source of a permission). E.g. an AWS inline policy or Azure RBAC role definition. Attributes: effect, actions, resource_pattern, condition (ABAC)【10†L6-L11】.
    • ResourceHierarchy: Represents hierarchical structure (e.g. a cloud project or Snowflake account). Links resources for multi-tenancy/compliance.
    • MaterializedEdge: Optionally store each computed reachability path (flattened graph edge) for performance.
    • EvidencePack: Packages of immutable findings (sealed with hash/signature) archived for audit.
    • ConnectorInstance: Metadata about each connector sync (its last run, rate-limit status, tenant_id). Useful for multi-tenancy.
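The identity/credential separation and the owner_type values described in the list above can be sketched as follows. This is a minimal illustration, not the product's actual schema: the `CREDENTIAL_SUBTYPES` set, the `NormalizedNode` class, and the `normalize` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class OwnerType(Enum):
    """owner_type values proposed in the assessment."""
    USER = "User"
    GROUP = "Group"
    ORG_UNIT = "OrgUnit"
    TEAM = "Team"


# Hypothetical set of connector subtypes that are really authentication
# artifacts and therefore belong on the credential entity.
CREDENTIAL_SUBTYPES = {"pat", "api_key", "certificate", "token"}


@dataclass(frozen=True)
class NormalizedNode:
    entity_type: str  # e.g. "identity" or "credential"
    subtype: str


def normalize(reported_type: str, subtype: str) -> NormalizedNode:
    """Route authentication artifacts to the credential entity.

    A connector may report a GitHub PAT as an identity subtype; this
    sketch reclassifies it instead of trusting the reported type.
    """
    key = subtype.lower()
    if key in CREDENTIAL_SUBTYPES:
        return NormalizedNode("credential", key)
    return NormalizedNode(reported_type, key)


if __name__ == "__main__":
    print(normalize("identity", "PAT"))
    print(normalize("identity", "service_account"))
```

Doing the reclassification at normalization time would keep connectors unchanged while still guaranteeing that no credential artifact is stored as an identity.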

Each entity should include a trust_boundary (e.g. tenant or trust_domain) and source_scope (source_system+region/account) to separate multi-tenant data【10†L6-L11】【1†L12-L17】.

Entity-Relationship Diagram and Schema Table

| Entity Type | Purpose & Examples | Key Attributes (min schema) | Lifecycle Events | Relationships | Audit/Evidence |
|---|---|---|---|---|---|
| Identity | Principals (user/app/service) | { id, type(human/bot/service), source_scope, lifeCycleStatus, attributes } | created, credential rotation, disabled/deleted | HAS_ROLE→Role, OWNED_BY→Owner, ACTS_AS→(execution/evidence) | Logins, actions by identity |
| Application | Defines automated logic | { id, name, owner, source_scope, config, triggers } | created, modified, deprecated | RUNS_EXEC→Execution, USES_CONN→Connection | Deployment, config-change events |
| Execution | Running instance (job/event) | { id, app_id, start_time, end_time, status, actor_identity } | started, completed, failed | USES_CRED→Credential, APPLIED_PERM→Permission, RELATED_EVIDENCE→EvidencePack | Execution logs, audit trail |
| Connection | External integration config | { id, endpoint, type, auth_method, tenant_id } | created, updated, retired | USES_CRED→Credential, INVOKED_BY→Application | API call logs, error logs |
| Credential | Auth material (token/key/etc) | { id, kind, issuer, subject, expiry, scopes, secret_ref } | issued/rotated, revoked | AUTH_AS→Identity, AUTH_FOR→Application, AUTH_AS_EXEC→Execution | Issue/revoke logs |
| Owner | Accountability (person/team) | { id, owner_type(User/Group/Team/Org), contact, dept } | added/removed, role change | OWNS→Identity/Application/ConnectorInstance | Ownership change logs |
| Role | Permission set | { id, name, description, source_scope } | created, updated, retired | GRANTS→Permission, ASSIGNED_TO→Identity/Owner | Role assignment/removal audit |
| Permission | Fine-grained action (ABAC) | { id, action, resource_pattern, effect, condition } | added, removed from role | APPLIES_TO→Resource, GRANTED_BY→Role | Policy change logs |
| Resource | Data/object target | { id, type, path, sensitivity, source_scope } | created, moved, archived | PROTECTED_BY→Permission, TOUCHED_BY→Execution | Data access logs |
| ExecutionEvidence | Proof artifacts/logs | { id, execution_id, timestamp, data_hash } | logged | EVIDENCES→Execution | Immutable store of logs |
| ConnectorInstance | Sync metadata | { id, type, tenant_id, last_sync, status } | sync_started, sync_completed | (links to Application/Owner) | Sync logs, rate-limit records |
| FederationTrust | Trust config (OIDC/SAML) | { id, issuer, jwks_uri, audiences, thumbprint } | created, rotated, revoked | TRUSTED_BY→Credential/Connection | Config change logs |
| PolicyStatement | IAM policy fragment | { id, source_system, effect, principals, actions, resources, conditions } | created, updated, deleted | GENERATED_FROM→ConnectorInstance, EXPANDS→Permission | Policy change audit |
| EvidencePack | Immutable finding bundle | { id, creation_time, hash, signer } | created, archived | CONTAINS→ExecutionEvidence+Permission+Resource | WORM storage, digital signatures |
| ResourceHierarchy | Org/tenant structure | { id, level, parent_id, region } | added, reorganized | CONTAINS→Resource | Governance change logs |

Key: source_scope ≈ (cloudAccount/tenant/cluster ID), trust_boundary mirrors it. The above schema fields capture identity type, credentialing, lifecycle state, policy conditions, multi-tenancy, and audit needs in line with standards (OAuth, SCIM/SAML) and NIST/CIS controls.
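One way to keep ingested edges consistent with the Relationships column of the schema table is to check each edge against an allow-list. The sketch below transcribes only a subset of rows; `ALLOWED_EDGES` and `is_valid_edge` are hypothetical names, and the remaining rows would follow the same shape.

```python
# (source type, relation) -> allowed target types, transcribed from a
# subset of the schema table. Illustrative only, not exhaustive.
ALLOWED_EDGES = {
    ("Identity", "HAS_ROLE"): {"Role"},
    ("Identity", "OWNED_BY"): {"Owner"},
    ("Application", "RUNS_EXEC"): {"Execution"},
    ("Application", "USES_CONN"): {"Connection"},
    ("Execution", "USES_CRED"): {"Credential"},
    ("Execution", "APPLIED_PERM"): {"Permission"},
    ("Connection", "USES_CRED"): {"Credential"},
    ("Role", "GRANTS"): {"Permission"},
    ("Permission", "APPLIES_TO"): {"Resource"},
    ("ExecutionEvidence", "EVIDENCES"): {"Execution"},
}


def is_valid_edge(source_type: str, relation: str, target_type: str) -> bool:
    """Return True if the edge matches a row of the allow-list."""
    return target_type in ALLOWED_EDGES.get((source_type, relation), set())


if __name__ == "__main__":
    print(is_valid_edge("Role", "GRANTS", "Permission"))    # allowed by the table
    print(is_valid_edge("Role", "GRANTS", "Resource"))      # wrong target type
```

Rejecting or flagging unknown edges at ingest time makes taxonomy violations, such as a credential reported as an identity, visible early.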

Real-World Autonomous Examples

Each scenario below lists its entity graph (key entities and relations), followed by notes.

GitHub Action + AWS: A CI workflow uses GitHub OIDC to assume an AWS role.
  • Actors: GH App (identity), OIDC token (credential), AWS Role (identity), CI Job (execution).
  • Relationships: GH App HAS_CRED GH OIDC token; the GitHub Action USES_CRED (the OIDC token) to call AWS STS, which ASSUMES_ROLE into the AWS Role; Execution APPLIED_PERM on AWS resources.
  • Identity: GitHub App, AWS Role
  • Credential: OIDC token, AWS session token
  • Permission: AWS IAM policy
  • Resource: S3 bucket
  • Owner: DevOps Team
  • Notes: Example of federated auth (RFC8693【6†L12-L19】).

Azure Flow + OpenAI: A Logic App triggers when a new Entra group member is added and calls an LLM for summarization.
  • Entities: Flow definition (application), Entra SP (identity), OIDC token (cred), LLM endpoint (resource).
  • Relationships: Flow RUNS_EXEC; Flow USES_CONN to OpenAI endpoint; Flow's SP USES_CRED to acquire token; Execution evidence shows API call.
  • Identity: Azure Service Principal
  • Credential: Entra managed identity token
  • Permission: OpenAI API scope
  • Resource: Custom data table, OpenAI endpoint
  • Owner: IT Automation Team
  • Notes: Demonstrates AI-assisted automation.

K8s CronJob + Vault: A nightly job retrieves secrets from Vault, then writes to a DB.
  • Entities: Kubernetes ServiceAccount (identity), TLS cert (cred), CronJob (execution), DB table (resource).
  • Relationships: SA HAS_CRED TLS cert; CronJob USES_CRED to authenticate; Execution evidence: K8s audit logs, DB logs.
  • Identity: K8s ServiceAccount
  • Credential: TLS client cert
  • Permission: DB write privilege
  • Resource: Database table
  • Owner: DevOps
  • Notes: Illustrates infrastructure automation.

AI Data Pipeline: An ML pipeline first processes public data and is later updated to fetch customer PII for personalization.
  • Entities: Pipeline app, API tokens (credential), data lake (resource), LLM model (resource).
  • Relationships: Pipeline USES_CRED for data source; Execution initially TOUCHED public DB, later TOUCHED customer PII table using the same app identity.
  • Identity: Pipeline service
  • Credential: API key for data lake
  • Permission: Data lake SELECT permission
  • Resource: Public vs PII dataset
  • Owner: Data Science Team
  • Notes: Scope drift: the identity and token did not change, but data sensitivity rose.

(More examples: Terraform automation, Databricks jobs accessing Unity Catalog, ServiceNow business rule invoking external ML API, AD group rule provisioning accounts, etc.)
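The GitHub Action + AWS scenario above can be expressed as a small edge list and walked to recover the authority path from the workflow identity to the S3 bucket. This is a hedged sketch: the node ids, the `HAS_PERM` relation, and the `authority_path` helper are assumptions made for illustration, and the edge directions are one plausible reading of the scenario.

```python
from collections import deque

# (source, relation, target) edges for the GitHub Action + AWS scenario.
# Node ids and the HAS_PERM relation are hypothetical.
EDGES = [
    ("identity:github-app", "HAS_CRED", "credential:oidc-token"),
    ("credential:oidc-token", "ASSUMES_ROLE", "identity:aws-role"),
    ("identity:aws-role", "HAS_PERM", "permission:aws-iam-policy"),
    ("permission:aws-iam-policy", "APPLIES_TO", "resource:s3-bucket"),
]


def authority_path(edges, start, goal):
    """Breadth-first search returning [node, relation, node, ...] or None."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [relation, target]))
    return None


if __name__ == "__main__":
    path = authority_path(EDGES, "identity:github-app", "resource:s3-bucket")
    print(" -> ".join(path))
```

Because edges are directed, searching from the resource back to the identity returns no path, which matches the intent that authority flows from principal to resource.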

Scope-Drift Scenarios

  1. Data Sensitivity Creep: An AI agent initially trained on public records now ingests PII without a code change.

    • Detection: Audit logs show that the same automation→resource link moved from a non-PII table to a PII table. A sensitivity label mismatch triggers a finding (CIS Control 03 on data classification).
    • Remediation: Quarantine the automation, review its data access policy, and rotate its credentials if needed. Enforce stricter RBAC so LLM calls only access explicitly allowed columns (NIST AC-6【10†L6-L11】).
  2. Unauthorized Scope Expansion: A serverless job originally granted the "read_sales_data" permission later also runs an "export_customer_list" procedure.

    • Detection: Compare the permissions used in each execution over time. The Permission entity now includes a new action (export_customer_list) that was not present in the initial deployment.
    • Remediation: Flag deviation, require dev-team approval, and remediate the IAM role. Use a “deny-by-default” guardrail or OPA policy (NIST SP 800-162 ABAC principles).
  3. Hidden Delegation: A microservice began using a new downstream API (triggered by an AI recommendation) it wasn’t originally authorized for.

    • Detection: Materialized edges show an unexpected Execution→Resource link. Verify if the connecting Credential and trust chain (token_exchange) were legitimate.
    • Remediation: Invalidate the exchanged token, tighten trust relations (e.g. remove JWT audience), and require new provisioning for the service principal.
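The first two detections above reduce to comparing a later execution against a baseline. The following is a minimal sketch under stated assumptions: the `Execution` record shape, the `SENSITIVITY_RANK` labels, and both function names are hypothetical and not taken from the existing data model.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, ordered from least to most sensitive.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "pii": 2}


@dataclass(frozen=True)
class Execution:
    app_id: str
    started_at: int                    # epoch seconds
    resource_sensitivities: frozenset  # labels of resources touched
    actions_used: frozenset            # permission actions exercised


def _max_rank(execution: Execution) -> int:
    ranks = (SENSITIVITY_RANK[s] for s in execution.resource_sensitivities)
    return max(ranks, default=0)


def sensitivity_creep(baseline: Execution, current: Execution) -> bool:
    """True if the current run touched more sensitive data than the baseline."""
    return _max_rank(current) > _max_rank(baseline)


def new_actions(baseline: Execution, current: Execution) -> frozenset:
    """Actions exercised now that were absent from the baseline run."""
    return current.actions_used - baseline.actions_used


if __name__ == "__main__":
    first = Execution("job-1", 1, frozenset({"public"}),
                      frozenset({"read_sales_data"}))
    later = Execution("job-1", 2, frozenset({"public", "pii"}),
                      frozenset({"read_sales_data", "export_customer_list"}))
    print(sensitivity_creep(first, later), sorted(new_actions(first, later)))
```

Neither check looks at the identity or credential, which is the point of both scenarios: the drift is visible only in what the execution touched or did.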

Schema Changes & Migration

  • Rename & Subtype: Rename automation to Application/Definition. Introduce subtype Execution for runs. Example: treat a ServiceNow Flow definition as an Application and its run instance as an Execution.
  • Expand owner: Add owner_type (User/Group/Team/Org). Map human_identity to User, and model Azure AD groups or ServiceNow department as owners too.
  • Remap autonomous_identity: Existing data where NormalizedNodeType=autonomous_identity should be split by subtype. In migration, examine each row’s subtype: if it’s a script, map to Application; if it’s a robot account, map to Identity with type=service_account.
  • Introduce new types: Update schema to include the types above (service_account, token_exchange, etc). For example, model an AWS STS session as an EphemeralSession with trust_boundary = AWS account ID.
  • Tests: Write integration tests using sample metadata: e.g. ingest a GitHub OIDC workflow, verify it creates an Identity (GitHub App) and a Credential (OIDC token) linked via token_exchange. Ensure an LLM call from a dummy automation yields correct Resource labeling and evidence.
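The autonomous_identity remapping rule described above could look like the following per-row transform. It is a sketch only: the row layout, the subtype values in `SCRIPT_SUBTYPES` and `ROBOT_SUBTYPES`, and the `needs_review` flag are assumptions introduced for illustration.

```python
# Hypothetical subtype values; the real connector vocabulary may differ.
SCRIPT_SUBTYPES = {"script"}
ROBOT_SUBTYPES = {"robot_account"}


def remap_autonomous_identity(row: dict) -> dict:
    """Return a migrated copy of a row; the input row is not modified."""
    migrated = dict(row)
    if row.get("NormalizedNodeType") != "autonomous_identity":
        return migrated

    subtype = row.get("subtype")
    if subtype in SCRIPT_SUBTYPES:
        # Scripts define what runs, so they become Applications.
        migrated["NormalizedNodeType"] = "application"
    elif subtype in ROBOT_SUBTYPES:
        # Robot accounts are principals, so they stay identities.
        migrated["NormalizedNodeType"] = "identity"
        migrated["type"] = "service_account"
    else:
        # Unknown subtypes are left unchanged and flagged for manual review.
        migrated["needs_review"] = True
    return migrated


if __name__ == "__main__":
    print(remap_autonomous_identity(
        {"id": "n1", "NormalizedNodeType": "autonomous_identity",
         "subtype": "robot_account"}))
```

Flagging unrecognized subtypes instead of guessing keeps the migration reversible for rows the rule does not cover.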

Migration Plan:

  1. Schema Extension: Add new tables/collections for the new entities (FederationTrust, PolicyStatement, etc).
  2. Data Migration: In a maintenance window, run a script to transform autonomous_identity records and fill owner_type. Validate by comparing ID counts pre/post.
  3. Connector Updates: Adjust connectors to emit owner groups. For example, Azure AD connector should emit both user and group owners.
  4. Testing: Use synthetic scenarios (see above) to verify scope drift detection. Check queries like “find executions accessing PII resources after initial timestamp”.
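Two of the checks in the plan above, comparing ID counts before and after migration and querying for executions that touched PII after a given timestamp, can be sketched over plain in-memory rows. The field names (`id`, `start_time`, `touched`, `sensitivity`) and both function names are assumptions; a real implementation would run against the graph store.

```python
def ids_preserved(before_rows, after_rows) -> bool:
    """True if migration kept exactly the same set and count of ids."""
    before_ids = [row["id"] for row in before_rows]
    after_ids = [row["id"] for row in after_rows]
    return (len(before_ids) == len(after_ids)
            and set(before_ids) == set(after_ids))


def pii_executions_after(executions, resources, initial_timestamp):
    """Ids of executions that started after the timestamp and touched PII."""
    pii_ids = {r["id"] for r in resources if r["sensitivity"] == "pii"}
    return [
        e["id"]
        for e in executions
        if e["start_time"] > initial_timestamp and pii_ids & set(e["touched"])
    ]


if __name__ == "__main__":
    resources = [
        {"id": "public_db", "sensitivity": "public"},
        {"id": "customer_pii", "sensitivity": "pii"},
    ]
    executions = [
        {"id": "run-1", "start_time": 100, "touched": ["public_db"]},
        {"id": "run-2", "start_time": 200, "touched": ["customer_pii"]},
    ]
    print(pii_executions_after(executions, resources, 150))
```

Comparing id sets as well as counts catches the case where one record is dropped and another duplicated, which a count-only check would miss.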

Recommendations

  • Security: Enforce least privilege by design. Model and audit policy conditions (AWS IAM conditions, ABAC) explicitly【10†L6-L11】. Use signed EvidencePacks to prevent tampering (aligns with audit requirements【14†L】).
  • Operational: Enhance connector reliability: handle rate limits and partial failures by marking incomplete syncs (evidence_completeness flags). Regularly rotate credentials and review federation trusts.
  • Product: Improve UI to clarify entity taxonomy. Label “automation” as “Application/Flow” and expose owner hierarchies. Provide automated scope-drift alerts (e.g. if a run’s target resource sensitivity exceeds baseline).
  • Governance: Ensure cross-domain testing: CI pipelines should include tests where a bot’s data access is varied to simulate scope drift. Maintain documentation linking entity types to standards (e.g. OAuth2, SCIM schemas).
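The signed EvidencePack recommendation above can be illustrated with a hash-then-sign sketch using only the Python standard library. HMAC-SHA256 is used here as a simple symmetric stand-in for whatever signature scheme is actually chosen; the pack layout and the `seal_evidence_pack` and `verify_evidence_pack` names are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical_bytes(findings: list) -> bytes:
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(findings, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")


def seal_evidence_pack(findings: list, signing_key: bytes) -> dict:
    """Attach a SHA-256 hash and an HMAC signature to a list of findings."""
    digest = hashlib.sha256(_canonical_bytes(findings)).hexdigest()
    signature = hmac.new(signing_key, digest.encode("ascii"),
                         hashlib.sha256).hexdigest()
    return {"findings": findings, "hash": digest, "signature": signature}


def verify_evidence_pack(pack: dict, signing_key: bytes) -> bool:
    """Recompute hash and signature; any change to findings fails the check."""
    digest = hashlib.sha256(_canonical_bytes(pack["findings"])).hexdigest()
    expected = hmac.new(signing_key, digest.encode("ascii"),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(digest, pack["hash"])
            and hmac.compare_digest(expected, pack["signature"]))


if __name__ == "__main__":
    key = b"demo-key-not-for-production"
    pack = seal_evidence_pack(
        [{"execution": "run-2", "resource": "customer_pii"}], key)
    print(verify_evidence_pack(pack, key))
```

A symmetric key means anyone who can verify a pack can also forge one, so an asymmetric signature would be the more likely choice where auditors must verify packs independently.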

This analysis assumes no further constraints beyond enterprise best practices and zero-trust principles【1†L12-L17】【6†L12-L19】. The cloud and SaaS scenarios discussed above were considered, and the revised model is aligned with standards (OAuth2/OIDC, SAML, SCIM, NIST 800-63/53, CIS, CSA).


Next Action

Status: adopted, shipped. External validation confirmed the entity model and authority path approach. Findings were incorporated into data model hardening in 01-data-model.md. No further action required.